AITopics | gec system

IMPARA-GED: Grammatical Error Detection is Boosting Reference-free Grammatical Error Quality Estimator

Sakai, Yusuke, Goto, Takumi, Watanabe, Taro

arXiv.org Artificial Intelligence, Jun-4-2025

We propose IMPARA-GED, a novel reference-free automatic grammatical error correction (GEC) evaluation method with grammatical error detection (GED) capabilities. We focus on the quality estimator of IMPARA, an existing automatic GEC evaluation method, and construct that of IMPARA-GED using a pre-trained language model with enhanced GED capabilities. Experimental results on SEEDA, a meta-evaluation dataset for automatic GEC evaluation methods, demonstrate that IMPARA-GED achieves the highest correlation with human sentence-level evaluations.

artificial intelligence, computational linguistic, natural language, (14 more...)

2506.02899

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)
Asia > Japan > Honshū (0.28)

Genre: Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Rethinking Evaluation Metrics for Grammatical Error Correction: Why Use a Different Evaluation Process than Human?

Goto, Takumi, Sakai, Yusuke, Watanabe, Taro

arXiv.org Artificial Intelligence, Feb-13-2025

One of the goals of automatic evaluation metrics in grammatical error correction (GEC) is to rank GEC systems such that the ranking matches human preferences. However, current automatic evaluations are based on procedures that diverge from human evaluation. Specifically, human evaluation derives rankings by aggregating sentence-level relative evaluation results, e.g., pairwise comparisons, using a rating algorithm, whereas automatic evaluation averages sentence-level absolute scores to obtain corpus-level scores, which are then sorted to determine rankings. In this study, we propose an aggregation method for existing automatic evaluation metrics which aligns with human evaluation methods to bridge this gap. We conducted experiments using various metrics, including edit-based metrics, n-gram-based metrics, and sentence-level metrics, and show that resolving the gap improves results for most of the metrics on the SEEDA benchmark. We also found that even BERT-based metrics sometimes outperform the metrics of GPT-4. We publish our unified implementation of the metrics and meta-evaluations.
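
The contrast between the two ranking procedures described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the rating algorithm, so plain win counting stands in for it here, and the system names and scores are hypothetical.

```python
from itertools import combinations


def rank_by_average(scores):
    """Corpus-level ranking: average each system's sentence scores, then sort."""
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    return sorted(means, key=means.get, reverse=True)


def rank_by_pairwise_wins(scores):
    """Ranking from sentence-level pairwise comparisons.

    Win counting is an assumed stand-in for a real rating algorithm.
    """
    wins = {name: 0.0 for name in scores}
    for a, b in combinations(scores, 2):
        for score_a, score_b in zip(scores[a], scores[b]):
            if score_a > score_b:
                wins[a] += 1
            elif score_b > score_a:
                wins[b] += 1
            else:  # a tie gives each system half a win
                wins[a] += 0.5
                wins[b] += 0.5
    return sorted(wins, key=wins.get, reverse=True)


# Hypothetical sentence-level metric scores: three systems, four sentences.
toy_scores = {
    "sys_a": [0.9, 0.1, 0.1, 0.1],
    "sys_b": [0.2, 0.2, 0.2, 0.2],
    "sys_c": [0.0, 0.3, 0.3, 0.0],
}

print(rank_by_average(toy_scores))
print(rank_by_pairwise_wins(toy_scores))
```

In this toy data a single very high score puts sys_a first under averaging, while it loses most individual comparisons and comes last under pairwise aggregation, so the two procedures can disagree on the same sentence-level scores.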

large language model, machine learning, natural language, (18 more...)

2502.09416

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Washington > King County > Seattle (0.04)
(12 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Data Science > Data Quality > Data Cleaning (0.64)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

Improving Explainability of Sentence-level Metrics via Edit-level Attribution for Grammatical Error Correction

Goto, Takumi, Vasselli, Justin, Watanabe, Taro

arXiv.org Artificial Intelligence, Dec-17-2024

Various evaluation metrics have been proposed for Grammatical Error Correction (GEC), but many, particularly reference-free metrics, lack explainability. This lack of explainability hinders researchers from analyzing the strengths and weaknesses of GEC models and limits the ability to provide detailed feedback for users. To address this issue, we propose attributing sentence-level scores to individual edits, providing insight into how specific corrections contribute to the overall performance. For the attribution method, we use Shapley values, from cooperative game theory, to compute the contribution of each edit. Experiments with existing sentence-level metrics demonstrate high consistency across different edit granularities and show approximately 70% alignment with human evaluations. In addition, we analyze biases in the metrics based on the attribution results, revealing trends such as the tendency to ignore orthographic edits. Our implementation is available at https://github.com/naist-nlp/gec-attribute.
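
As a rough illustration of attributing a sentence-level score to individual edits with Shapley values, the sketch below computes exact Shapley values by enumerating edit subsets. It is a generic textbook computation, not the code from the linked repository; the edit labels and `toy_metric` are hypothetical stand-ins for real edits and a real sentence-level metric.

```python
from itertools import combinations
from math import factorial


def shapley_values(edits, score):
    """Exact Shapley value of each edit.

    `score` maps a frozenset of applied edits to a sentence-level score.
    """
    n = len(edits)
    values = {}
    for edit in edits:
        others = [e for e in edits if e != edit]
        total = 0.0
        for k in range(n):  # k = size of the subset applied before `edit`
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                applied = frozenset(subset)
                total += weight * (score(applied | {edit}) - score(applied))
        values[edit] = total
    return values


def toy_metric(applied):
    """Hypothetical metric: two useful edits, an interaction bonus, and one
    orthographic edit that the metric ignores."""
    score = 0.0
    if "goes->went" in applied:
        score += 0.5
    if "a->the" in applied:
        score += 0.2
    if {"goes->went", "a->the"} <= applied:
        score += 0.1
    return score


toy_edits = ["goes->went", "a->the", "color->colour"]
attributions = shapley_values(toy_edits, toy_metric)
print(attributions)
```

The attributions sum to the difference between the score with all edits applied and the score with none. Exact enumeration grows exponentially with the number of edits, so a sampling approximation would be needed for sentences with many edits.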

computational linguistic, large language model, natural language, (16 more...)

2412.1311

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(10 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.63)
Information Technology > Data Science > Data Quality > Data Cleaning (0.63)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

LLM-based Code-Switched Text Generation for Grammatical Error Correction

Potter, Tom, Yuan, Zheng

arXiv.org Artificial Intelligence, Oct-14-2024

With the rise of globalisation, code-switching (CSW) has become a ubiquitous part of multilingual conversation, posing new challenges for natural language processing (NLP), especially in Grammatical Error Correction (GEC). This work explores the complexities of applying GEC systems to CSW texts. Our objectives include evaluating the performance of state-of-the-art GEC systems on an authentic CSW dataset from English as a Second Language (ESL) learners, exploring synthetic data generation as a solution to data scarcity, and developing a model capable of correcting grammatical errors in monolingual and CSW texts. We generated synthetic CSW GEC data, resulting in one of the first substantial datasets for this task, and showed that a model trained on this data is capable of significant improvements over existing systems. This work targets ESL learners, aiming to provide educational technologies that aid in the development of their English grammatical correctness without constraining their natural multilingualism.

artificial intelligence, dataset, natural language, (16 more...)

2410.10349

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(6 more...)

Genre: Research Report (0.50)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Grammatical Error Feedback: An Implicit Evaluation Approach

Bannò, Stefano, Knill, Kate, Gales, Mark J. F.

arXiv.org Artificial Intelligence, Aug-18-2024

Grammatical feedback is crucial for consolidating second language (L2) learning. Most research in computer-assisted language learning has focused on feedback through grammatical error correction (GEC) systems, rather than examining more holistic feedback that may be more useful for learners. This holistic feedback will be referred to as grammatical error feedback (GEF). In this paper, we present a novel implicit evaluation approach to GEF that eliminates the need for manual feedback annotations. Our method adopts a grammatical lineup approach where the task is to pair feedback and essay representations from a set of possible alternatives. This matching process can be performed by appropriately prompting a large language model (LLM). An important aspect of this process, explored here, is the form of the lineup, i.e., the selection of foils. This paper exploits this framework to examine the quality and need for GEC to generate feedback, as well as the system used to generate feedback, using essays from the Cambridge Learner Corpus.

computational linguistic, information, lexical information, (15 more...)

2408.09565

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Czechia > Prague (0.05)
North America > Mexico > Mexico City > Mexico City (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

Ye, Jingheng, Xu, Zishan, Li, Yinghui, Cheng, Xuxin, Song, Linlin, Zhou, Qingyu, Zheng, Hai-Tao, Shen, Ying, Su, Xin

arXiv.org Artificial Intelligence, Jun-30-2024

The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. They collectively contribute to revealing the critical characteristics and locating drawbacks of GEC systems. Evaluating systems by combining these dimensions leads to higher human consistency than other reference-based and reference-less metrics. Extensive experiments on 2 human judgement datasets and 6 reference datasets demonstrate the effectiveness and robustness of our method. The code will be released after the peer review.

cleme2, correction, error correction, (15 more...)

2407.00934

Country:

Europe > Austria > Vienna (0.14)
Asia > Singapore (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(10 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (0.84)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.72)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction

Wang, Yixuan, Wang, Baoxin, Liu, Yijun, Wu, Dayong, Che, Wanxiang

arXiv.org Artificial Intelligence, Mar-26-2024

Over-correction is a critical problem in the Chinese grammatical error correction (CGEC) task. Recent work using model ensemble methods based on voting can effectively mitigate over-correction and improve the precision of the GEC system. However, these methods still require the output of several GEC systems and inevitably lead to reduced error recall. In this light, we propose the LM-Combiner, a rewriting model that can directly modify the over-correction of GEC system outputs without a model ensemble. Specifically, we train the model on an over-correction dataset constructed through the proposed K-fold cross inference method, which allows it to directly generate filtered sentences by combining the original and the over-corrected text. In the inference stage, we directly take the original sentences and the output results of other systems as input and then obtain the filtered sentences through LM-Combiner. Experiments on the FCGEC dataset show that our proposed method effectively alleviates the over-correction of the original system (+18.2 Precision) while ensuring the error recall remains unchanged. Besides, we find that LM-Combiner still has a good rewriting performance even with few parameters and little training data, and thus can cost-effectively mitigate the over-correction of black-box GEC systems (e.g., ChatGPT).

correction, error correction, lm-combiner, (14 more...)

2403.17413

Country:

Asia > China > Inner Mongolia > Hohhot (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Minimum Bayes' Risk Decoding for System Combination of Grammatical Error Correction Systems

Raina, Vyas, Gales, Mark

arXiv.org Artificial Intelligence, Oct-27-2023

For sequence-to-sequence tasks it is challenging to combine individual system outputs. Further, there is also often a mismatch between the decoding criterion and the one used for assessment. Minimum Bayes' Risk (MBR) decoding can be used to combine system outputs in a manner that encourages better alignment with the final assessment criterion. This paper examines MBR decoding for Grammatical Error Correction (GEC) systems, where performance is usually evaluated in terms of edits and an associated F-score. Hence, we propose a novel MBR loss function directly linked to this form of criterion. Furthermore, an approach to expand the possible set of candidate sentences is described. This builds on a current max-voting combination scheme, as well as individual edit-level selection. Experiments on three popular GEC datasets and with state-of-the-art GEC systems demonstrate the efficacy of the proposed MBR approach. Additionally, the paper highlights how varying reward metrics within the MBR decoding framework can provide control over precision, recall, and the F-score in combined GEC systems.
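
A minimal sketch of MBR selection with an edit-level F-score as the reward: each candidate is scored against every candidate treated as a pseudo-reference, and the one with the highest expected reward is returned. The uniform weighting, the conventions for empty edit sets, and the toy edits are assumptions made for illustration; this is not the loss function or candidate-expansion scheme proposed in the paper.

```python
def f_beta(hyp_edits, ref_edits, beta=0.5):
    """Edit-level F-score between two sets of edits.

    Assumed convention: precision is 1.0 when the hypothesis has no edits,
    and recall is 1.0 when the reference has no edits.
    """
    tp = len(hyp_edits & ref_edits)
    fp = len(hyp_edits - ref_edits)
    fn = len(ref_edits - hyp_edits)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mbr_select(candidates, reward=f_beta):
    """Return the candidate with the highest expected reward, using all
    candidates (uniformly weighted) as pseudo-references."""
    def expected_reward(hyp):
        return sum(reward(hyp, ref) for ref in candidates) / len(candidates)
    return max(candidates, key=expected_reward)


# Hypothetical outputs of three GEC systems for one sentence, each encoded
# as a set of (start, end, correction) edits.
toy_candidates = [
    frozenset({(1, 2, "went"), (4, 5, "the")}),
    frozenset({(1, 2, "went")}),
    frozenset({(1, 2, "went"), (6, 7, "colour")}),
]

print(mbr_select(toy_candidates))
```

With beta = 0.5 the reward weights precision more heavily than recall, so the candidate containing only the edit shared by all three systems is selected. Passing a different `reward` function changes that balance.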

computational linguistic, mbr, sequence, (12 more...)

2309.0652

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
(3 more...)

CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction

Ye, Jingheng, Li, Yinghui, Zhou, Qingyu, Li, Yangning, Ma, Shirong, Zheng, Hai-Tao, Shen, Ying

arXiv.org Artificial Intelligence, Oct-17-2023

Evaluating the performance of Grammatical Error Correction (GEC) systems is a challenging task due to its subjectivity. Designing an evaluation metric that is as objective as possible is crucial to the development of the GEC task. However, mainstream evaluation metrics, i.e., reference-based metrics, introduce bias into the multi-reference evaluation by extracting edits without considering the presence of multiple references. To overcome this issue, we propose Chunk-LEvel Multi-reference Evaluation (CLEME), designed to evaluate GEC systems in the multi-reference evaluation setting. CLEME builds chunk sequences with consistent boundaries for the source, the hypothesis and references, thus eliminating the bias caused by inconsistent edit boundaries. Furthermore, we observe the consistent boundary could also act as the boundary of grammatical errors, based on which the F0.5 score is then computed following the correction independence assumption. We conduct experiments on six English reference sets based on the CoNLL-2014 shared task. Extensive experiments and detailed analyses demonstrate the correctness of our discovery and the effectiveness of CLEME. Further analysis reveals that CLEME is robust when evaluating GEC systems across reference sets with varying numbers of references and annotation styles.

computational linguistic, correction, metric, (15 more...)

2305.10819

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Maryland > Baltimore (0.04)
(17 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.83)
Information Technology > Data Science > Data Quality > Data Cleaning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Evaluation of really good grammatical error correction

Östling, Robert, Gillholm, Katarina, Kurfalı, Murathan, Mattson, Marie, Wirén, Mats

arXiv.org Artificial Intelligence, Aug-17-2023

Although rarely stated, in practice, Grammatical Error Correction (GEC) encompasses various models with distinct objectives, ranging from grammatical error detection to improving fluency. Traditional evaluation methods fail to capture the full range of system capabilities and objectives. Reference-based evaluations suffer from limitations in capturing the wide variety of possible corrections and from the biases introduced during reference creation, and are prone to favor fixing local errors over overall text improvement. The emergence of large language models (LLMs) has further highlighted the shortcomings of these evaluation strategies, emphasizing the need for a paradigm shift in evaluation methodology. In the current study, we perform a comprehensive evaluation of various GEC systems using a recently published dataset of Swedish learner texts. The evaluation is performed using established evaluation metrics as well as human judges. We find that GPT-3 in a few-shot setting by far outperforms previous grammatical error correction systems for Swedish, a language comprising only 0.11% of its training data. We also found that current evaluation methods contain undesirable biases that a human evaluation is able to reveal. We suggest using human post-editing of GEC system outputs to analyze the amount of change required to reach native-level human performance on the task, and provide a dataset annotated with human post-edits and assessments of grammaticality, fluency and meaning preservation of GEC system outputs.

large language model, machine learning, natural language, (20 more...)

2308.08982

Country:

Europe > Sweden > Stockholm > Stockholm (0.05)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
(15 more...)

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
(2 more...)
